2021-09-29
Each research question draws its own challenges which are unique in themselves. But:
Data ➡️ Cleaning ➡️ Fitness evaluation ➡️ Analysis
Mirroreum offers more computational resources* and a standardized environment
The SBDI4R package enables the R community to directly access data and resources hosted by SBDI.
Please refer to the package documentation for details on how to install it. Once installed the SBDI4R package must be loaded for each new R session:
library(SBDI4R)
Various aspects of the SBDI4R package can be customized.
caching
download reason
SBDI4R can cache most results to local files. This means that if the same code is run multiple times, the second and subsequent iterations will be faster. This will also reduce load on the web servers. By default, this caching is session-based, meaning that the local files are stored in a temporary directory that is automatically deleted when the R session is ended. This behaviour can be altered so that caching is permanent, by setting the caching directory to a non-temporary location. For example, under Windows, use something like:
sbdi_config(cache_directory = file.path("c:","mydata","sbdi_cache")) ## Windows
or for Linux:
sbdi_config(cache_directory = "~/mydata/sbdi_cache") ## Linux
Note that this directory must exist (you need to create it yourself).
Each download request to SBDI servers is also accompanied by an “e-mail address” string that identifies the user making the request. You will need to provide an email address registered with the SBDI. You can create an account here. Once an email is registered with the SBDI, it should be stored in the config:
sbdi_config(email="your.registered@emailaddress.com")
Else you can provide this e-mail address as a parameter directly to each call of the function occurrences().
SBDI requires that you provide a reason when downloading occurrence data (via the SBDI4R occurrences() function). You can provide this as a parameter directly to each call of occurrences(), or you can set it once per session using:
sbdi_config(download_reason_id = "your_reason_id")
(See sbdi_reasons() for valid download reasons, e.g. * 3 for “education”, * 7 for “ecological research”, * 8 for “systematic research/taxonomy”, * 10 for “testing”)
NO other personal identification information is sent. You can see all configuration settings, including the the user-agent string that is being used, with the command:
sbdi_config()
If you make a request that returns an empty result set (e.g. an un-matched name), by default you will simply get an empty data structure returned to you without any special notification. If you would like to be warned about empty result sets, you can use:
sbdi_config(warn_on_empty=TRUE)
Some additional packages are needed for these examples. Install them if necessary with the following script:
to_install <- c("colorRamps", "cowplot","dplyr","ggplot2",
"leaflet", "maps", "mapdata", "maptools",
"remotes","sf", "tidyr", "xts")
to_install <- to_install[!sapply(to_install,
requireNamespace,
quietly=TRUE)]
if(length(to_install)>0)
install.packages(to_install,
repos="http://cran.us.r-project.org")
remotes::install_github("Greensway/BIRDS")
The Swedish Electrofishing Registry - SERS (Department of Aquatic Resources, SLU Aqua).
2.8 M observations starting in the 1950’s.
fq_str <- pick_filter("resource")
Follow the instructions. Your choices here would have been ‘in3’ ▶️ ‘dr10’ (data resource 10 = SERS). Your variable fq_str will now contain a string ‘data_resource_uid:dr10’.
y1 <- 2008
y2 <- 2012
fq_str <- c(fq_str, paste0("year:[", y1, " TO ", y2,"]"))
# Note the square brackets are hard limits
Both filter strings (for data resource and for time period) will be treated as AND factors. For references on how to use the filters see the SBDI APIs documentation.
Using the function occurrences() we can now query for the observations fulfilling our filter. If you haven’t specified your email and the download reason in the sbdi_config() before, you need to pass this here.
xf <- occurrences(fq = fq_str) # Remove what is not a species xf$data <- xf$data[xf$data$rank == "species",] # Simply summarise all records by data source table(xf$data$dataResourceName)
You can quickly plot all the observations as a PDF file with the function ocurrence_plot(), one page per species:
Note that the plot is saved to a .pdf file in the current working directory. You can find that with getwd().
occurrences_plot(xf, "output/obsPlot.pdf",
grouped=FALSE,
taxon_level="species",
pch='.')
There are many other ways of producing spatial plots in R. The leaflet package provides a simple method of producing browser-based maps with panning, zooming, and background layers:
library(leaflet)
# drop any records with missing lat/lon values
xfl <- xf$data[!is.na(xf$data$longitude) | !is.na(xf$data$latitude),]
marker_colour <- rep("#d95f02", nrow(xfl))
# blank map, with imagery background
leaflet() |>
addProviderTiles("Esri.WorldImagery") |>
# add markers
addCircleMarkers(xfl$longitude, xfl$latitude,
radius = 1,
fillOpacity =.5,
opacity = 1,
col=marker_colour,
clusterOptions = markerClusterOptions())
A quick summary over the years reveals a drop in number of records over time.
hist(xf$data$year, breaks = seq(y1, y2), xlab = "Year")
In the same way we can summarise the number of observations for each species, by common or scientific name.
sppTab <- table(xf$data$commonName) sppDF <- as.data.frame(sppTab) colnames(sppDF)[1] <- "species" head(sppDF)
## species Freq ## 1 61 ## 2 Alpine bullhead 4615 ## 3 American burbot 7081 ## 4 Aral asp 6 ## 5 Arctic char 46 ## 6 aurora trout 856
sppTab <- table(xf$data$scientificName) sppDF <- as.data.frame(sppTab) colnames(sppDF)[1] <- "species" head(sppDF)
## species Freq ## 1 Abramis brama (Linnaeus, 1758) 61 ## 2 Alburnus alburnus (Linnaeus, 1758) 660 ## 3 Anguilla anguilla (Linnaeus, 1758) 2140 ## 4 Astacus astacus (Linnaeus, 1758) 618 ## 5 Barbatula barbatula (Linnaeus, 1758) 620 ## 6 Blicca bjoerkna (Linnaeus, 1758) 74
Perhaps, you want to send this table as a .CSV file to a colleague. Save the table:
write.csv(sppDF, "output/SERS_species_summary.csv") # NOTE: again this will be saved on your working directory
Let’s now ask: How does the species richness vary across Sweden?
library(sf)
# load some shapes over Sweden's political borders
data("swe_wgs84", package="SBDI4R", envir=environment())
# a standard 50 km grid
data("Sweden_Grid_50km_Wgs84", package="SBDI4R", envir=environment())
grid <- Sweden_Grid_50km_Wgs84
obs <- st_as_sf(as.data.frame(xf$data),
coords = c("longitude","latitude"),
crs = st_crs(4326))
# overlay the occurrence data with the grid
ObsInGridListID <- st_intersects(grid, obs)
ObsInGridList <- lapply(ObsInGridListID, function(x) st_drop_geometry(obs[x,]))
wNonEmpty <- unname( which( unlist(lapply(ObsInGridList, nrow)) != 0) )
The result ObsInGridList is a list object with a subset of the data for each grid cell. Now summarise occurrences within grid cells:
# apply a summary over the grid cells
nCells <- length(ObsInGridList)
res <- data.frame("nObs"=as.numeric(rep(NA,nCells)),
"nYears"=as.numeric(rep(NA,nCells)),
"nSpp"=as.numeric(rep(NA,nCells)),
row.names = row.names(grid),
stringsAsFactors = FALSE)
cols2use <- c("scientificName", "year")
dataRes <- lapply(ObsInGridList[wNonEmpty],
function(x){
x <- x[,cols2use]
return(c("nObs" = length(x[,"scientificName"]),
"nYears" = length(unique(x[,"year"])),
"nSpp" = length(unique(x[,"scientificName"]))
)
)
}
)
dataRes <- as.data.frame(dplyr::bind_rows(dataRes, .id = "gridID"))
res[wNonEmpty,] <- dataRes[,-1]
resSf <- st_as_sf(data.frame(res, st_geometry(grid)))
And finally plot the grid summary as a map:
We may now ask whether species richness varies across latitude. So we go further by arranging the observations by latitude:
library(dplyr)
library(tidyr)
xgridded <- xf$data |>
mutate(longitude = round(longitude * 4)/4,
latitude = round(latitude * 4)/4) |>
group_by(longitude,latitude) |>
## subset to vars of interest
select(longitude, latitude, species) |>
## take one row per cell per species (presence)
distinct() |>
## calculate species richness
mutate(richness=n()) |>
## convert to wide format (sites by species)
mutate(present=1) |>
do(tidyr::pivot_wider(data=.,
names_from=species,
values_from=present,
values_fill=0)) |>
ungroup()
## where no species was present, it shows NA. Here we convert these to 0
sppcols <- setdiff(names(xgridded),
c("longitude", "latitude", "richness"))
xgridded <- xgridded |>
mutate_at(sppcols, function(z) ifelse(is.na(z), 0, z))
And plot it accordingly:
library(ggplot2)
ggplot(xgridded, aes(latitude, richness)) +
labs(x = "Latitude (º)",
y = "Species richness") +
lims(y = c(0,20)) +
geom_point() +
theme_bw()
In this example we are interested in exploring opportunistically collected data from the Swedish citizen science species observation portal - Artportalen.
To begin with, we want be sure there is an unequivocal way to find the species within the order Odonata (dragonflies) and nothing else, so let’s search for “odonata”:
sx <- search_fulltext("odonata")
## [1] "https://species.biodiversitydata.se/ws/search.json?q=odonata&fq=idxtype%3ATAXON"
sx$data[, c("guid", "scientificName", "rank", "occurrenceCount")]
## guid scientificName rank occurrenceCount ## 1 9829523 Odonata associated gemycircularvirus 1 species 0 ## 2 10072832 Odonata associated gemycircularvirus 2 species 0 ## 3 8062407 Bdellodes odonata Wallace & Mahon, 1976 species 0 ## 4 789 Odonata order 14421 ## 5 7367071 Ramalina fastigiata var. odonata Hue variety 0
Let’s refine the search. To know which search fields we can use to filter the search we use the function sbdi_fields(fields_type = "general"). The search field we are looking for is “order_s”.
sx <- search_fulltext(fq="order_s:Odonata", page_size = 10)
## [1] "https://species.biodiversitydata.se/ws/search.json?fq=order_s%3AOdonata&fq=idxtype%3ATAXON&pageSize=10"
sx$data[, c("guid", "scientificName", "rank", "occurrenceCount")]
## guid scientificName rank occurrenceCount ## 1 1429753 Gomphomacromia Brauer, 1864 genus 0 ## 2 1426725 Austropetalia Tillyard, 1916 genus 0 ## 3 4799335 Sogjutella Pritykina, 1980 genus 0 ## 4 4302686 Neuragrion Karsch, 1891 genus 0 ## 5 4799353 Xamenophlebia Pritykina, 1981 genus 0 ## 6 1429769 Lauromacromia Geijskes, 1970 genus 1 ## 7 1428195 Sympetrum Newman, 1833 genus 613 ## 8 4798599 Corduliochlora Marinov & Seidenbusch, 2007 genus 0 ## 9 1423625 Torrenticnemis Lieftinck, 1949 genus 0 ## 10 1423468 Cyanallagma Kennedy, 1920 genus 0
Now we can download the taxonomic data (note that the search is case-sensitive):
tx <- taxinfo_download("order_s:Odonata",
fields = c("guid", "order_s","genus_s", "specificEpithet_s",
"scientificName", "canonicalName_s", "rank"),
verbose = FALSE)
## restrict to species and not hybrids
tx <- tx[tx$rank == "species" & tx$genusS != "",]
You can save the tx object as the complete species list for later use.
We start by searching for the data resource we are interested in using the function pick_filter(). This is an interactive query guiding you through the many resources available to filtering your query (data resources, spatial layers, and curated species lists).
# follow the instructions
fq_str <- pick_filter("resource")
Follow the instructions. Your choices here would have been “in3” ▶️ “dr5”. Your variable fq_str will now contain a string “data_resource_uid:dr5”.
We only want to look at data from year 2000 to 2010:
y1 <- 2000
y2 <- 2010
fq_str <- c(fq_str, paste0("year:[", y1, " TO ", y2,"]"))
# Note the square brackets are hard limits
We also want to filter spatially for Southern Sweden (Götaland).
SBDI APIs take as search input polygons in the so-called WKT Well Known Text format.
data("swe", package = "SBDI4R")
wGotaland <- swe$Counties$LnNamn %in% c("Blekinge", "Gotlands", "Hallands", "Jönköpings", "Kalmar",
"Kronobergs", "Östergötlands", "Skåne", "Västra Götalands")
gotaland_c <- swe$Counties[wGotaland,]
Let’s construct the WKT string:
# transform the CRS gotaland_c <- st_transform(gotaland_c, crs = st_crs(4326)) # disolve the counties into one polygon gotaland <- st_union(gotaland_c) # create a convex hull of the polygon gotaland_ch <- st_convex_hull(gotaland) # cast it as MULTIPOLYGON as this is what SBDIs API need gotaland_ch <- st_cast(gotaland_ch, to = "MULTIPOLYGON") # create WKT string wkt <- st_as_text(gotaland_ch)
The WKT string then looks like this:
## [1] "MULTIPOLYGON (((13.33575 55.34003, 12.81633 55.38594, 11.25342 58.35786, 11.13161 58.90942, 11.13145 59.01184, 11.21142 59.0897, 11.31566 59.11651, 11.82032 59.23553, 11.94833 59.26237, 12.06197 59.27159, 12.23104 59.27357, 15.79383 59.03876, 15.84306 59.02498, 19.2889 57.99043, 19.3058 57.96888, 18.90037 57.44014, 18.86704 57.39753, 18.3725 57.00678, 18.30044 56.9528, 16.40805 56.20229, 14.19057 55.38557, 13.33575 55.34003)))"
Next, we download the observations using the command occurrences().
xf <- SBDI4R::occurrences(taxon = "order:Odonata",
fq = fq_str,
wkt = wkt,
extra = "collector")
Before we can use the observation records we need to know if the observation effort (sampling effort) has varied over time and in space. we can
We can approximate observation effort from the data by defining field visits i.e. occasions at which an observer has sampled observations using the package BIRDS. Additionally we want the data to be summarized over a grid of 25 km. Please refer to the BIRDS package documentation for more detail.
remotes::install_github("Greensway/BIRDS")
library(BIRDS)
OB <- organiseBirds(xf$data, sppCol = "species" ,
# We only want observations identified at the species level
taxonRankCol = "rank", taxonRank = "species",
# the visits are defined by collector and named locality
idCols = c("locality", "collector"),
timeCols = c("year", "month", "day"),
xyCols =c("longitude","latitude") )
We don’t need the whole grid, just the piece that overlaps our searching polygon
wInt <- unlist(st_intersects(gotaland, Sweden_Grid_25km_Wgs84)) gotaland_grid25 <- Sweden_Grid_25km_Wgs84[wInt,]
SB <- summariseBirds(OB,
grid = gotaland_grid25,
spillOver = "unique")
Once summarised, we can see over space and for a few selected years how the number of observations is distributed
We now want to use the number of field visits as the measure for sampling effort:
vis <- ggplot(data = SB$spatial, aes( fill = nVis)) +
geom_sf() +
ggtitle("Number of visits")
spp <- ggplot(data = SB$spatial ,aes( fill = nSpp)) +
geom_sf() +
ggtitle("Number of species")
We see that SB contains an element called SB$temporal that contains a daily time series with time-specific rows when there is information. xts also supports day time, but dating below day resolution is not yet implemented in the BIRDS package.
sb.xts <- SB$temporal head(sb.xts, 5)
## nObs nVis nSpp ## 2000-03-24 1 1 1 ## 2000-04-05 4 3 3 ## 2000-04-06 11 6 3 ## 2000-04-10 1 1 1 ## 2000-04-12 3 3 1
Sub-setting is convenient in xts as you can do it with its dates and with a / for a range of dates.
sb.xts["2010-09-07"] #a specific day sb.xts["2010-09-01/2010-09-15"] #for a period sb.xts["2010-09"] #a specific month
The package xts has several tools for converting to different time periods. Here we use apply.monthly to obtain the total number of observations and visits per month. Read more in ?plot.xts.
library(xts)
obs.m <- apply.monthly(sb.xts$nObs, "sum", na.rm = TRUE)
vis.m <- apply.monthly(sb.xts$nVis, "sum", na.rm = TRUE)
plot(obs.m, col = "darkblue", grid.ticks.on = "month",
major.ticks = "year", grid.col = "lightgrey",
main = "Total number of daily observations and visits per month")
lines(vis.m, col = "orange", lwd=2, on=1)
We can now look at some particular species and ask whether those have changed in occurrence over time:
speciesSummary(SB)
speciesSummary(SB)[1:20,1:4]
## species nCells nObs nVis ## 1 Aeshna affinis 3 32 27 ## 2 Aeshna caerulea 5 12 12 ## 3 Aeshna cyanea 118 859 825 ## 4 Aeshna grandis 145 1691 1661 ## 5 Aeshna isoceles 14 102 99 ## 6 Aeshna juncea 93 334 321 ## 7 Aeshna mixta 72 645 614 ## 8 Aeshna serrata 13 39 38 ## 9 Aeshna subarctica 35 108 99 ## 10 Aeshna viridis 13 40 35 ## 11 Anax imperator 43 514 474 ## 12 Anax parthenope 2 5 5 ## 13 Brachytron pratense 82 434 421 ## 14 Calopteryx splendens 101 681 627 ## 15 Calopteryx virgo 122 1058 1004 ## 16 Coenagrion armatum 23 74 67 ## 17 Coenagrion hastulatum 116 930 897 ## 18 Coenagrion johanssoni 15 75 70 ## 19 Coenagrion lunulatum 42 111 104 ## 20 Coenagrion puella 113 1313 1258
We pick two species and compare their trends in number of visits where the species where reported, relative to the total number of visits.
library(dplyr)
sppCountLq <- obsData(OB) |>
group_by(year,visitUID) |>
summarise("focalCount" = sum(scientificName == "Libellula quadrimaculata"),
"sppLength" = length(unique(scientificName)),
.groups = "drop") |>
ungroup() |>
group_by(year) |>
summarise("focalCount" = sum(focalCount),
"nVis" = length(unique(visitUID)),
.groups = NULL)
sppCountLq$relCount <- sppCountLq$focalCount/sppCountLq$nVis
sppCountSd <- obsData(OB) |>
group_by(year,visitUID) |>
summarise("focalCount" = sum(scientificName == "Sympetrum sanguineum"),
"sppLength" = length(unique(scientificName)),
.groups = "drop") |>
ungroup() |>
group_by(year) |>
summarise("focalCount" = sum(focalCount),
"nVis" = length(unique(visitUID)),
.groups = NULL)
sppCountSd$relCount <- sppCountSd$focalCount/sppCountSd$nVis
oldpar <- par(no.readonly = TRUE)
par(mar=c(4,4,1,1), las=1)
plot(sppCountLq$year, sppCountLq$relCount, type = "l",
lwd = 3, xlab = "Year", ylab = "Relative number of visits with observations",
ylim=c(0, max(sppCountLq$relCount)), xaxp=c(2000, 2010, 10))
lines(sppCountSd$year, sppCountSd$relCount, lwd=3, col="#78D2EB")
legend("bottomright",
legend=c("Libellula quadrimaculata","Sympetrum sanguineum"),
text.font = 3, col = c("black", "#78D2EB"), lwd = 3, bty = "n")
par(oldpar)
Open Source also means that you can contribute. You don’t need to know how to program but every input is appreciated. Did you find something that is not working? Have suggestions for examples or text? you can always
The repositories you can contribute to are: